REVIEW ARTICLE

Sequence Alignment of the G-Protein 
Coupled Receptor Superfamily 

WILLIAM C. PROBST,*† LENORE A. SNYDER,*§ DAVID I. SCHUSTER,* JURGEN BROSIUS,* and STUART C. SEALFON*†

ABSTRACT

The multitude of G-protein coupled receptor (GPR) superfamily cDNAs recently isolated has exceeded the number of receptor subtypes anticipated by pharmacological studies. Analysis of the sequence similarities and unique features of the members of this family is valuable for designing strategies to isolate related cDNAs, for developing hypotheses concerning substrate-ligand and receptor-effector interactions, and for understanding the evolution of these genes. We have compiled and aligned the 74 unique amino acid sequences published to date and review the present understanding of the structural motifs contributing to ligand binding and G-protein coupling.

INTRODUCTION

The cloning of a great number of receptors and channels has revealed that many of these critical membrane proteins can be grouped into gene superfamilies based on sequence and structural similarities. One of these superfamilies comprises the G-protein coupled receptors (GPRs). Although the signal transduction mechanism is not known for all members of the gene family, in most cases receptor stimulation induces activation of a guanine nucleotide binding protein, or G-protein. In 1982 the complete amino acid sequence of the visual pigment bovine rhodopsin was determined (Ovchinnikov et al., 1982). Its predicted structure, containing an extracellular amino terminus and seven hydrophobic membrane-spanning α-helices (Hargrave et al., 1983), was remarkably similar to that previously identified by electron diffraction and sequence analysis for bacteriorhodopsin (Unwin and Henderson, 1975; Engelman et al., 1980). The subsequent molecular cloning of four human opsins (Nathans and Hogness, 1984; Nathans et al., 1986) and the hamster β2-adrenergic receptor (Dixon et al., 1986) again revealed the structural features that have become the hallmark of this gene family (Fig. 1).

The number of GPRs that have been cloned is increasing rapidly; at present 74 distinct GPR sequences have been published. GPR cloning has led to the stable, high-level expression of these receptor subtypes in cell lines, a preparation that has greatly aided the pharmacological characterization of these receptors. Molecular biological alteration of receptor sequences and expression in cell lines has provided much of our knowledge concerning the functional role of particular receptor regions and residues.

We have aligned all the available amino acid sequences of the members of this family (Fig. 2). This compilation should prove useful for designing strategies to isolate other GPRs. Indeed, several receptors, such as the dopamine receptors (Bunzow et al., 1988; Dearry et al., 1990), the adenosine receptors (Libert et al., 1989b), and the cannabinoid receptor (Matsuda et al., 1990), have been cloned by approaches relying on sequence homology. The sequence alignment may also aid the generation of hypotheses concerning the role of certain protein sequences in determining the ligand binding and G-protein specificity of the receptors. Comparison of the structure of the genes for these receptors can provide insight into the evolution of this gene family.
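Homology-based cloning often starts from a peptide stretch that is conserved across the alignment, from which a mixture of oligonucleotides covering every possible codon is synthesized. The sketch below is an illustration, not a procedure described in this review: it computes the size of such a fully degenerate mixture from the standard genetic code. The helper name `primer_degeneracy` is hypothetical.

```python
from math import prod

# Number of sense codons per amino acid in the standard genetic code.
CODONS_PER_AA = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def primer_degeneracy(peptide: str) -> int:
    """Hypothetical helper: number of distinct DNA sequences that encode
    `peptide` in the standard code (size of a fully degenerate oligo mix)."""
    return prod(CODONS_PER_AA[aa] for aa in peptide.upper())
```

For example, the tripeptide D-R-Y gives 2 × 6 × 2 = 24 combinations. Stretches rich in methionine and tryptophan (one codon each) keep the mixture small, whereas leucine, serine, and arginine (six codons each) inflate it quickly, which is one reason conserved, low-degeneracy stretches are attractive primer targets.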
§Graduate School and University Center of the City University of New York, Ph.D. Program in Biology, New York, NY 10036-4099.
FIG. 1. The topography of G-protein linked receptors. Cylinders represent transmembrane α-helices. Extracellular and cytoplasmic sides of the plasma membrane are indicated.

The sequences were aligned manually, relying on invariant residues and published computer-generated sequence alignments. Several of the sequences, such as the FC5R receptor, have not yet been proven to represent GPRs. They are included in the alignment, however, because their sequences identify them as members of this superfamily. Where sequences of the same receptor subtype from more than one species have been published, we have included only the sequence from the phylogenetically highest species. The sequences are organized into subgroups based on ligand type, e.g., muscarinic receptors, catecholamine receptors, etc.
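Because the alignment in Fig. 2 was assembled by hand around invariant residues, it may help to see what "invariant" means operationally for a gapped alignment. The minimal sketch below (not the authors' procedure) lists the columns in which every row carries the same residue and no row has a gap. The function name and the toy rows in the test are hypothetical and are not taken from Fig. 2.

```python
def invariant_columns(aligned: list) -> list:
    """Return 0-based indices of columns in which every aligned row has the
    same residue; columns containing a gap ('-') are never invariant."""
    length = len(aligned[0])
    if any(len(row) != length for row in aligned):
        raise ValueError("aligned rows must all have the same length")
    result = []
    for col in range(length):
        residues = {row[col] for row in aligned}
        if len(residues) == 1 and "-" not in residues:
            result.append(col)
    return result
```

Such columns can serve as fixed anchors when a new sequence is added to the alignment by hand.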



GENERAL STRUCTURAL FEATURES 

All of the proteins are single polypeptide chains. The shortest sequence represents the rat mas oncogene (324 amino acids) and the longest sequence represents the human thyroid-stimulating hormone receptor (744 amino acids). The predicted protein structures contain seven stretches of 20-30 hydrophobic amino acids that are believed to form membrane-spanning α-helices. These helices are referred to as transmembrane domains 1-7 (TM 1-TM 7). This predicted structure, based on hydropathy analysis, has been supported by electron diffraction analysis of bacteriorhodopsin (Henderson et al., 1990) and by proteolytic cleavage studies of rhodopsin and the β-adrenergic receptor (Hargrave et al., 1982; Dohlman et al., 1988). The proteins have extracellular amino termini and cytoplasmic carboxyl termini.
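Hydropathy analysis of this kind can be sketched as a sliding-window average. The review does not state which scale or window was used; the code below assumes the Kyte-Doolittle scale with a 19-residue window and a cutoff of 1.6, settings commonly used to flag candidate membrane-spanning segments. The function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the review names none).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list:
    """Mean hydropathy of every full window; entry i averages seq[i:i + window]."""
    values = [KD[aa] for aa in seq.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def hydrophobic_segments(seq: str, window: int = 19, threshold: float = 1.6) -> list:
    """Merge overlapping windows whose mean reaches `threshold` into
    (start, end) spans, 0-based with `end` exclusive."""
    segments = []
    for i, score in enumerate(hydropathy_profile(seq, window)):
        if score < threshold:
            continue
        start, end = i, i + window
        if segments and start <= segments[-1][1]:
            # This window overlaps the previous span, so extend that span.
            segments[-1] = (segments[-1][0], end)
        else:
            segments.append((start, end))
    return segments
```

Applied to a GPR sequence, this procedure would be expected to return roughly seven spans. Because averaging smears the edges, the spans only approximate the TM boundaries; for a toy sequence of 10 Asp, 25 Leu, and 10 Asp residues, the reported span (5, 40) is wider than the 25-residue hydrophobic core.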

The areas of greatest homology among the GPRs are in the seven transmembrane regions. Some residues are found in virtually all GPRs and may mediate the tertiary structure required for functional activity (Hulme et al., 1990; Hibert et al., 1991). Particularly well conserved are several proline residues in TM 4, 5, 6, and 7. These residues most likely introduce kinks in the α-helices and may be important in the formation of the binding pocket (Applebury and Hargrave, 1986; Findlay and Eliopoulos, 1990; Dahl et al., 1991; Hibert et al., 1991). Other well-conserved residues include a glycine, an asparagine, and a valine in TM 1; a leucine, two alanines, and an aspartate in TM 2; an isoleucine in TM 3; a tryptophan in TM 4; a phenylalanine and a tryptophan in TM 6; and an asparagine and a tyrosine in TM 7 (see Fig. 3). Certain conserved residues are replaced in particular subfamilies. For example, the conserved TM 6 tryptophan is replaced by methionine in the glycoprotein hormone receptors (Fig. 2).
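The statement that homology is concentrated in the transmembrane regions can be made quantitative by computing percent identity over selected alignment columns, for instance the columns of one TM box versus those of a loop. The sketch below assumes that columns where either row has a gap are skipped; other conventions divide by the full region length instead. The function name and test rows are hypothetical.

```python
from typing import Optional


def percent_identity(a: str, b: str, start: int = 0, end: Optional[int] = None) -> float:
    """Percent identity of two aligned rows over columns [start, end).

    Columns in which either row has a gap ('-') are skipped; this is one
    of several conventions and is an assumption of this sketch.
    """
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    if end is None:
        end = len(a)
    compared = identical = 0
    for x, y in zip(a[start:end], b[start:end]):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0
```

Calling it once with the column range of a TM domain and once with that of an adjacent loop gives two numbers that can be compared directly for any pair of receptors in the alignment.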

Most GPRs have a single conserved cysteine residue in each of the first two extracellular loops; these two cysteines are believed to form a disulfide bond that stabilizes the functional protein structure (see Fig. 3). Mutation of either of these conserved cysteine residues markedly alters the function of rhodopsin, muscarinic, and β-adrenergic receptors (Dixon et al., 1987a; Karnik et al., 1988; Fraser, 1989; Hulme et al., 1990). The most highly conserved intracellular sequence is the aspartate-arginine-tyrosine (DRY) triplet adjacent to TM 3, which has been implicated in signal transduction (see below). The arginine of this triplet is invariant, and the aspartate and tyrosine are conservatively replaced in several GPRs.
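A motif with an invariant core and conservative flanking substitutions maps naturally onto a regular expression. The substitution sets below (Asp or Glu before the Arg; Tyr, Phe, Trp, or His after it) are an illustrative assumption, not a rule derived in this review, and the function name is hypothetical.

```python
import re

# Illustrative pattern (assumption): invariant Arg, preceded by Asp or Glu,
# followed by Tyr or an aromatic/His substitute. The lookahead lets
# overlapping matches be reported.
DRY_LIKE = re.compile(r"(?=([DE]R[YFWH]))")


def dry_like_sites(seq: str) -> list:
    """Return (1-based position, triplet) for every DRY-like match in seq."""
    return [(m.start() + 1, m.group(1)) for m in DRY_LIKE.finditer(seq.upper())]
```

A three-residue pattern will also match by chance elsewhere in a receptor, so in practice the search would be restricted to the cytoplasmic end of TM 3 as located by the alignment.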

The amino termini of these proteins vary greatly in length, ranging from as few as seven residues in the adenosine A2 receptor to over 300 residues in the glycoprotein hormone receptors. Overall, there is little sequence homology among the receptors in the first extracellular domain. The amino termini of nearly all the GPRs contain consensus sequences (N-X-S/T) for N-glycosylation (Kornfeld and Kornfeld, 1985). Rhodopsin and the α2-, β1-, and β2-adrenergic receptors are all glycosylated at several of these sites (Hargrave, 1977; Strasser et al., 1984; Benovic et al., 1987b; Dohlman et al., 1987; Regan, 1988). Glycosylation may contribute to the proper expression of membrane proteins (for review, see Kornfeld and Kornfeld, 1985). Deletion of the glycosylated domains of the β2-adrenergic receptor decreased the level of receptor expression but did not alter ligand binding (Dixon et al., 1987b). Inhibition of glycosylation diminished muscarinic receptor expression (Liles and Nathanson, 1986). The thyroid-stimulating hormone receptor contains six potential glycosylation sites; mutational analysis demonstrated that two of these sites are required for the expression of functional receptor (Russo et al., 1991). Some receptors with short amino termini (the A1 and A2 adenosine and the α2B-adrenergic receptors) do not contain amino-terminal asparagine glycosylation sites.
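Locating potential N-glycosylation sites amounts to scanning for the N-X-S/T consensus. The sketch below adds the commonly applied refinement that X may not be proline, since a proline at that position blocks glycosylation; that refinement, the function name, and the test peptide are assumptions for illustration. A match marks only a potential site, not evidence that the site is glycosylated.

```python
import re

# N-X-S/T sequon with X != Pro. The zero-width lookahead reports
# overlapping sequons such as N-N-T following N-L-S.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(seq: str) -> list:
    """Return 1-based positions of Asn residues that begin an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(seq.upper())]
```

Running it over the amino-terminal segment of each aligned sequence is one way to check which receptors, like the adenosine receptors noted above, lack amino-terminal sites.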

Phosphorylation and palmitoylation of carboxy-terminal sites can influence the signal transduction of some GPRs. Most GPRs contain potential phosphorylation sites in the third cytoplasmic loop and/or the carboxyl terminus. For several receptors, phosphorylation by protein kinase A and by specific receptor kinases mediates receptor desensitization (see Intracellular Coupling, below). Two adjacent cysteine residues in the carboxyl terminus of rhodopsin and one in the β2-adrenergic receptor are palmitoylated (Ovchinnikov et al., 1988; O'Dowd et al., 1989a). The hydrophobicity profile of the GPRs predicts seven TM domains and thus three intracellular loops (Fig. 1). The palmitate on the carboxy-terminal cysteine(s) would be expected to insert into the membrane, thereby forming an additional cytosolic loop that may influence receptor mobility (Findlay and Eliopoulos, 1990) or G-protein coupling (O'Dowd et al., 1989a).
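Potential protein kinase A sites in the third cytoplasmic loop or carboxyl terminus can be flagged with a consensus scan. The sketch assumes the R-R-X-S/T consensus commonly used for protein kinase A; sites for receptor-specific kinases do not follow a simple consensus of this kind and are not captured. The function name and test peptide are hypothetical.

```python
import re

# Assumed PKA consensus R-R-X-S/T; the lookahead lets overlapping motifs match.
PKA_SITE = re.compile(r"(?=RR.[ST])")


def pka_candidate_sites(seq: str) -> list:
    """Return 1-based positions of the Ser/Thr acceptor in each R-R-X-S/T match."""
    # m.start() is the 0-based index of the first Arg; the acceptor is 3
    # residues further on, so its 1-based position is m.start() + 4.
    return [m.start() + 4 for m in PKA_SITE.finditer(seq.upper())]
```

Restricting the input to loop and tail segments defined by the alignment avoids reporting transmembrane or extracellular matches, which a cytoplasmic kinase could not reach.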



G-PROTEIN COUPLED RECEPTORS






FIG. 2. Amino acid sequence alignment of the GPR superfamily. The putative transmembrane domains are enclosed in boxes. The precise boundaries of the TM domains are not known with certainty. Dashes have been introduced for the purpose of alignment. Amino acids omitted from nonconserved regions are indicated by numbers in parentheses.
1. Dictyostelium cAMP receptor (Klein et al., 1988)
2. Dog adenosine A2 receptor (RDC8) (Libert et al., 1989b)
3. Dog adenosine A1 receptor (RDC7) (Libert et al., 1989b)
4. Human m1 muscarinic acetylcholine receptor (Peralta et al., 1987)
5. Human m2 muscarinic acetylcholine receptor (Peralta et al., 1987)
6. Human m3 muscarinic acetylcholine receptor (Peralta et al., 1987)
7. Human m4 muscarinic acetylcholine receptor (Peralta et al., 1987)
8. Human m5 muscarinic acetylcholine receptor (Bonner et al., 1988)
9. Human beta 1 adrenergic receptor (Frielle et al., 1987)
10. Human beta 2 adrenergic receptor (Kobilka et al., 1987a)
11. Human beta 3 adrenergic receptor (Emorine et al., 1989)
12. Cow alpha 1 adrenergic receptor (Schwinn et al., 1990)
13. Rat alpha 1B adrenergic receptor (Voigt et al., 1990)
14. Human alpha 2 C4 adrenergic receptor (Regan et al., 1988)
15. Human alpha 2 C2 adrenergic receptor (Lomasney et al., 1990)
16. Human alpha 2 C10 adrenergic receptor (Kobilka et al., 1987c)
17. Rat alpha 2 adrenergic receptor R20 (Lanier et al., 1991)
18. Drosophila octopamine receptor (Arakawa et al., 1990)
19. Human dopamine D1 receptor (Dearry et al., 1990)
20. Human dopamine D5 receptor (Sunahara et al., 1991)
21. Human dopamine D2 receptor (Grandy et al., 1989)
22. Human dopamine D3 receptor (Giros et al., 1990)
23. Human dopamine D4 receptor (Van Tol et al., 1991)
24. Human serotonin 1D receptor (RDC4) (Hamblin and Metcalf, 1991)
25. Human serotonin 1A receptor (Kobilka et al., 1987b)
26. Rat serotonin 1C receptor (Julius et al., 1988)
27. Rat serotonin 2 receptor (Julius et al., 1990)
28. Human histamine H2 receptor (Gantz et al., 1991)
29. Human N-formyl peptide receptor (Boulay et al., 1990)
30. Human C5a anaphylatoxin receptor (Gerard and Gerard, 1991)
31. Human thrombin receptor (Vu et al., 1991)
32. Human thromboxane A2 receptor (Hirata et al., 1991)
33. Human IL-8 receptor (Murphy and Tiffany, 1991)
34. Guinea-pig platelet-activating factor receptor (Honda et al., 1991)
35. Cow endothelin 1 receptor (Arai et al., 1990)
36. Rat non-isopeptide selective endothelin receptor (Sakurai et al., 1990)
37. Mouse bombesin/gastrin releasing peptide receptor (Spindel et al., 1991)
38. Rat neuromedin B preferring bombesin receptor (Wada et al., 1991)
39. Human vasoactive intestinal peptide receptor (Sreedharan et al., 1991)
40. Rat neurotensin receptor (Tanaka et al., 1990)
41. Rat bradykinin receptor (McEachern et al., 1991)
42. Mouse thyrotropin-releasing hormone receptor (Straub et al., 1990)
43. Human neurokinin A (SK) receptor (Gerard et al., 1990)
44. Rat substance P receptor (Yokota et al., 1989)
45. Rat neuromedin K receptor (Shigemoto et al., 1990)
46. Bovine adrenal angiotensin II type-1 receptor (Sasaki et al., 1991)
47. Human mas oncogene (angiotensin) receptor (Young et al., 1986)
48. Human lutropin-choriogonadotropin receptor (Frazier et al., 1990)
49. Human thyrotropin receptor (Libert et al., 1989a)
50. Human follicle stimulating hormone receptor (Minegishi et al., 1991)
51. Human rhodopsin (Nathans and Hogness, 1984)
52. Human green opsin (Nathans et al., 1986)
53. Human red opsin (Nathans et al., 1986)
54. Human blue opsin (Nathans et al., 1986)
55. Odorant receptor F3 (Buck and Axel, 1991)
56. Odorant receptor F5 (Buck and Axel, 1991)
57. Odorant receptor F6 (Buck and Axel, 1991)
58. Odorant receptor F12 (Buck and Axel, 1991)
59. Odorant receptor I3 (Buck and Axel, 1991)
60. Odorant receptor I7 (Buck and Axel, 1991)
61. Odorant receptor I8 (Buck and Axel, 1991)
62. Odorant receptor I9 (Buck and Axel, 1991)
63. Odorant receptor I14 (Buck and Axel, 1991)
64. Odorant receptor I15 (Buck and Axel, 1991)
65. Human cannabinoid receptor (Matsuda et al., 1990)
66. Mouse glucocorticoid-induced receptor (Harrigan et al., 1991)
67. Rat FC5R (Eva et al., 1990)
68. Human endothelial cell GPR (Hla and Maciag, 1990)
69. Rat testis G-protein coupled receptor 1 (Meyerhof et al., 1991a)
70. Rat RGHJP (Meyerhof, DNA and Cell Biology, in press, 1991b)
71. Human thoracic aorta GPR (Ross et al., 1990)
72. Cytomegalovirus (human) GPR, US33 (Chee et al., 1990)
73. Cytomegalovirus (human) GPR, US27 (Chee et al., 1990)
74. Cytomegalovirus (human) GPR, US28 (Chee et al., 1990)
[Fig. 2 alignment panels for sequences 1-74]
AVAIVRLVLGFLWPLLTtTICYT- 
YYFSAFSAVFFFVPLIIStVCYVS 
LLFSMLGGLSVGLSFLLNTVSVA- 
LLRI IPQSFGFIVPLLIKLYCYGF 
LI IHICI VLGFFI VFLLILFCNL- 

WWLFGFY FCMPLVCtAI FYTL 

WWLFAFYFCLPLAITAIFYTL 

IHSMASFLVFYVIPLAIISVYYYF 
IHSVLIFLVYFUPLVIISIYYYH 
SMELVSWLGFAVPFSIIAVFYFS 
WIQVNTFMSFLFPMLVISILNT- 
F1NMLLNLVGFLLPLSI ITFCTVR 
P I YLMDFGVFYWPMI LATVLYGF 
LYHLWI ALI YFLPLAVMFVAYS - 
A YHI CVTVLI YFLPLLVI GY AYT- 
TYHIIVIILVYCFPLLIMGVTYT- 
GLGLTKNILGFLFPFLIILTSYT- 
IFIAILSFLVFT-PLMLVSSTIL- 

-VY I LTI LI LNWAFLI ICACYI - 
-AYIVFVLTLNIVAFVIVCCCYV- 
-LYVMSLLVLNVLAFVVICGCYT- 

SFVI YMFWHFTI PMI I IFFCYG- 
SY^VLMVTCCITPLSI IVLCYL- 
SYMIVLMVTCCIIPIAI IMLCYL- 
SYTWFLFI FCFIVPLSLICFSYT- 

LVIYFTLVLLATVPLAGIFYSYF- 
LMI LTEGAWMVTPFVC ILISYI- 
LVSFGIAFCVILGSCGITLVSYA- 
LIMNLVPVMLAAISFSGILYSYF- 
LMI F IMS TLX 1 1 1 PFFLIVMS YA- 
LTOFVLAIFILLGPLSVTGASYM- 
LMIHIMGVIIIVIPFVLIVISYA- 
LAIFILGGPIWLPFLLIIVSYA- 
LMIYILGGLI III PFLLIVMSYV- 
LVIFVMGGLVIVIPFVLIIVSYA.- 

FWIGVTSVLLLFIVYAYMYILW-- 
YLDLATFILLYLLPLFI ISVAYA- 
SYTTLLLVLQYFGPLCFIFICYF- 
— Yl LFCTIVFTLLLLS IVI LYC- 
-YMFFSFIWILIPLWMCI IYLD 
GFVLYTFLMGFLLPVGAICLCYV- 
ISLGILLFFLFC-PLMVLPCLAL- 
SWKVLLTMVWGAAPVIMWIWFYA- 
FLNTKVNICGYLAPIALMAYTYN- 
I LNVELMLGAFVI PL5VI5YCYY- 



SRYTYWIHNGVSDN - 

RIFLAARRQLKQMESQPLPGERARSTLO- 
EVFYLI RRQLGKKVSASSGOPQKYYG 



RIYRETENRARELAALQGSETPGKCGGSSSSSERSQPGAEGSPETP 
HISRASKSRI KKDKKEPVANQDPVS PSLVQGRIVKPNNNNMPSSDD 
RIYKETEKRTKEIAGLOASGTEAETENFVHPTGSSRSCSSYELQQQ 
HI SIASRS RVHKHRPEGPKEKKAKTIAFLKSP LMKQSVKKP PPGEA 
RIYRETEKRXKDLADLQGSDS VY KAEKRKP AH RALFRSCLRCPRPT 

RVFREAQKC7VKKIDSCERRFLGGPARPPSPSPSPVPAPAPPGPPRP 
RVFQEAKRQI^KIDKSEGRFHVQNLSQVEQDGRTGHGLRRSSKFCL 
RVFWATRQLRLUIGELGRFPPEESPPAPSRS1APAPVGTGAPPEG 
RVYWAKRESRGLKSGLKTDKSD SEQVTLRIHRKNAQVGGSGVTS A 
RVY IVAKRTrKNI*EACVHKEMSNSKELT7JUHWS KNFHEDTLS STK 
RIYRVAKRRTRTLSEKRAPVGPDGASPTTENGLGAAAGEARTGTAR 
RIYIJAKRSNRRGPRAKGGPGQGESKQPRPDHGGAIASAKLPALAS 
RI YQI AKRRTRVPP SRRDPDAVAAPPGGTERRPNGLGPERS AG PGG 
RI YQI AKRRTRVPPSRRGPDACS AP PGGAD RRPNAVGP ERGAGTAG 
EIFVATRRRLRERARANKI>nriAIJ<STEl£PMANSSPVAASNSGSK 
RIYRIAQKQI RRLAALERAAVHAKNCQTTTGNKPVECSQPESSFKM 

RIYRIAOVQIRRISSLERAAEHAQSCRSSAACAPDTSLRASIK 

KIYIVUWRRKRVNTKRSSRAFRAHIiRAPLK^^ 
RIYWLKQRRRKRI LTRQNSQCNSVRP GFPQQSTSLPO PAHLELKR 
ATFRGLQRWEVARRAKLHGRAPRRPSGPGPPSPTPPAPRLPQDPCC 
RIYRAARNRIIWPSLYGKRFTTAHLITCSAGSSLCSLNSSLKEGH 
RIFRAARFRI RKTVKKVE KTGADTRHGASP APQP KKSVNGESGSRN 
LTIWIJlRarLMliRGHTEEEXANMSUIFI^CCKKNGGEX 
LTIKSLQKEIATICVSOLSTRAKLASFSFLPQSSLSSEKI4FQRSIHR 
RIFKVARDQAKRNHISSWKAATI 



S I VAVS YGLI ATKI HKQG I 

FILLRTWSRRA 

I I RCLSS SAVANRS KKSR- 
TLHHVYHGQEAAQQRPR— 
TLRTLFKAHM- 



VI I HTLLRGPVKQQRNA 

MTCEMLNRRNG5 LRIALS EHL 

MTCEMLRKKSGM-QIALNDHL 



IARNLIQSAYNLPVEGNIHVKKQI 

IAKTLIRSAHNLPGEYNEHTKKQM ; 

LLARAISASSD 

VIAWKLTVMVHQAAEQGRVCTVGTHNGI£HSTF7^IEPGRV- 

I MQVLRNNEMKKFKEVQ : 

I ARI LFLNPI PSDPKENSKWKNDS I HQNKNLNLNA 

VIGLTLWRRAVPGHQAHGANLRHL " 

WG I TLWASE IPGDSSDRYHEGV ; 

IVGITLWGGEIPGOTCDKYHEQL 

LIWKTLKKAYEI QKNKP : 

WKIRKNTWAS ; " 



KIYFAVRNPEIHATN-- 
KIYITVRNPOYNPGD— 
HIYLTVRNPNIVSSS— 



QLVFTVKEAAAQQQ ESATTQ-- 
QVWIA I RAVAKQQKES ESTQ — 
QVWLAIRAVAKQQKESESTQ — 
QLLRALKAVAAQQQESATTQ — 



KIVSSIC- 
HITCAVL- 
YIITTII- 
KIVSSIH- 
RIISSIL- 
AITGAVM- 
KIISSIL- 
RIVSSIF- 
RIFFSIL- 
RWASIL- 



KAH SKAVRMI QRGTQKSI 1 1 HTS EDGKVQVTRPDQA- 

RVAKKLWLCNTI GDVTTEQY LALR 

KIYIRLKRRNNMhDKIRDSKYRSS 

RI Y SLVRTRS RRLTFRKNI S KAS RS 

I FY 1 1 RNKLS.QNLTGFRETRAFYG 

LIIAKMRMVALKAGWQQRKR 

LHVECRARRRQ 

FFYSTVORTSQ 

RMVRFIINYVG ' : r " 

RISRIVAVSQS 
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-KEKHLTYQFK 

KEVHAAKS 

KELKXAX5 



{ 83) -KGQKPRGKEQLAKRKTFS LVKEKKAART 
(1 10) -K-IVKMItC-QPAKKJCP-PPSREKKVTRT 
(1 66) -KRTALKTRSQITKRKRMSLVKEKKAAQT 
(113) -K-FASIARNQVRKKRQM-AARERKVTRT 
(1 55) — KGLNPNPS HQMTKRKRMS LVKERKAAQT 

-AAAAATAPLANGRAGKRRPS RLVALREQ KALKT 
KEH KALKT 



VPACGRRPARLI PI REHRALCT 

KNKXHFSVRLLKFS REKKAAKT 

AKGHNPRSS I AVKLFKFS REKKAAKT 

- (77) -FLSRBRRARSSVCRRKVAQAREKRFTFV 
- (106) -CRGVGAI GGQWWRKRAHVTREKRFTFV 
- ( 84 ) -GRGRSASGLPRRRAGAGGONREKRFTFV 

- (84 > -GCGEERAGGAKASRWRCRONREKRFTFV 
-(167) - KKTSGVNQFI EE KQK IS LS KERRAART 

' 5FKRETKVLKT 

: XETKVLKT 

- ( 91 ) -PNGKTRTSLKIMSRBKLSOQKEKKATQM 
- (47) -SNGRLSTSLKLPLQPRCVPLREKKATQM 
-(29) -AIJ?POTPPtfrRRRRRAXITCRERKAMRV 
-(10) -NHVKI KLADSALERKRI SAARERKATKI 

- (57) -ASFERKNE RNA EAKRKMALARERKTVK? 
-NPNPDOKPRRKKKEKRPRGTMOAINNEKKASKV 

: EPGSYAGRKTMQS ISNEQKACKV 

REHKATVT 



— IKSSRPLRV 
— TRSTKTLKV 
--TNRCFNSTV 
--DSEVEMMAQ 
— GQKHRAMRV 
--EVRRRALWM 
— KQRREVAXT 
— KORREVAKT 
— ESRXRLAKT 
— ETRKRXAKI 
— QEKH5SRKI 
— QALRHGVLV 

TEKKATVL 

—SSRKCVTKM 
— QAXKKFVKT 
— SAKRKWKM 
— KAKRKWKM 
— RKDDIFKI 
— HSSKLYIV 



-KDTKIAKKM 
-KDTKIAKRM 
-SDTRIAKRM 

-KAEKEVTRM 
-KAEKEVTRM 
-KAEKEVTRM 
-KAEREVSRM 



-AISSVHGKYK 
-RVSSPRGGWX 
-KIPSARGRHR 
-SISTVQGKYK 
-KVPSTQGICK 
-RIPSAAGRHK 
-KVPSTQSIHK 
-KVPSSOSIHK 
-KFPSIODIYK 
-KVPSVRGIHK 



-RMDIRLAKT 
-RKKKTTVKM 
-ETKRINV-M 
-SENVALLKT 
—REFKTAKS 
— SERKITLM 
— RSAKLNHV 
-KQRSRTLTF 
-KWHMQTLHV 
-RHKGRJVRV 



LINYIIVFLVCWVFAWNRIVNGL 
LAI IVGLFALCWLPLHI INCFTFF 
LALILFLFALSWLPLHI LNCITLF 

LSAILLAFILTWTPYKIWLVSTF 
IIAIUAFIITWAPrNVMVLINTF 
LSAILLAFIITWTPYNIMVLVNTF 
IFA I IXAFILTWTPYNVMVLVNTF 
LSAILXAFIITWTPYNIM7LVSTF 

LCI IMGVFTLOTLPFFLANVVKAF 
LGIIMiTtTLCWLPFFIVNIVHVI 
LGLIMGTFTIXWLPFFLANVLRAL 
LGIWGCFVLCWLPFFLVMPIGSF 
LGIWGKFILCWLPFTIALPLG5L 
IAVV^CVFVLCWFP^F^IYSLYGI 
LAWIGVFVLCWFPFFFSYSLGAI 
LAWIGVFWCWFPFFFTYTLTAV 
IAVVIGVrVVCWFPFFFTYTLIAV 
LGI IMGVFVICWLPFFLMYVTLPF 
LSVIMGVFVCCWLPFFILNCILPF 
LSVIMGVFVCCWtPFTILNCMVPF 
LAIVLGVFIICWLPFFITHiLNIH 
VAIVLGAFIVCWLPFFLTHVLNTH 
LPVWGAFI LCWTPFFWH I TQAL 
LGIILGAFIICWLPFFWSLVLPI 
LGI IMGTFILCWLPFFIVALVLPF 
LGIVFFVFLIMtfCPFFITNILSVL 
LGI VFFLFWhWC PFF I TN IMA VI 
IAAVMGAFIICWFPYFXAFVYRGL 

LSFVAAAFFLCWSPYQWALIATV 
WAWASFFI FWLPYQVTG IhWSF 
ALFLSAAVFCIFI ICFGPTNVLLI 
LJjGlWVASVCWLPLLVFIAQTVL 
IFAWLIFLLCWLPYNLVLLADTL 
VCTVLAVFVI CFVPHHMVQLPWTL 
VFCLWIFALCWFPLHLSRI LKKT 
VFCLVLVFALCWLPLHLSRI LKLT 
VLVFVG LFAFCWL PNHVI Y LY RSY 
VLVFVGCFVFCWFPNHI LYLYRSF 
I FS YVWrLVCWLPYHVAVLLD IF 
LRAWIAFVVCWLPYHVRRLMFCY 
VIAVLGLFVLCWFPFOI STFLDTL 
LAWVI LFALLWMPYRTLWVNSF 
WLVWTFAI CWLPYH LYF I LG SF 
MIWVCTFAICWLPFHVFFLLPYI 
MIIVWTFAICWLPYHVYFILTAI 
ILAIVLFFFFSWVPHNIFTFT©VL 
IMVTI I IFLIFAMPMRLLYLLYYE 

AILIFTDFT-CMAPISFFAISAAF 
AVLI FTDFI-CMAPI SFYALSA I L 
AMLIFTOFL-CMAPISFFAISASL 

VIIMVIAFLICWVPYASVAFYIFT 
WVMVLAFCFCWGPYAFFACFAAA 
VWMI FA YCVCWG P YTFFACFAAA 
VWMVG SFCVCYVPYAAFAMYMVN 

AFSTCASHLSWSLFYCTGLGVYL 
SFSTCGSHLAWCLFYCTVIAVYF 
AFSTCSSHLTWLIWYGSTIFLHV 
AFSTCASHLS IVS LFYSTG LGVYV 
VFSTCGSHLSWSLFYGTI IGLYL 
AFSTCASHLTWI I FYAAS I FI YA 
VFSTCGSHLSWSLFYGTI IGLYL 
AFSTCGSHLS WS LFYGTV IGLYL 
VFSTCGSHLSWTLFYGTI FGI YL 
I FSTCGSHLSWS LFYCTI IGLYL 

LVLILWLIICWGPLLAIMVYDVF 
LVLWV LFALCWF PLNCYV LLLS- 
LLSIWAFAVCWLPLTI FNTVFDW 
VIIVLSVFIACWAPLFILLLLDVG 
LFLVLF LFALCWLPLS 1 I NFVS YF 
VmWMVFVI CWMPFYWOLVNVF 
VLAI VSVFLV-SS IYLCIDWFLFW 
VSVLLISFVALQTPYVSLMIFNSY 
LLVWVSFASFWFPFNLALFLESI 
LIAWLVFIIFWLPYHLTLFVDTI 



NMFPPALNILHTYL — 
— — -CPECSHAPLW- 
CPSCRKPSI— 



-CKDCVPET- 
-CAPCIPNT- 
-CDSCIPKT- 
-CQSCIPDT- 
-CDKCVPVT- 



— HRELVPDR— 
— QDNLIRKE— 
-GGPSLVPGP— 
— FPDFRPSET- 
— FSTLKPPDA- 



— CREACQVPGP- 
— C PKHCKVP HG- 



-CCSVPRT- 



—GCPVPYQ— 
— CQTCCPTNK- 



— CGSGETQPFCIDSN 

—CSGHPEGPPAGFPCVSET 

— CDCNIPPV 

-- CQTCHVSPE : 

—CPACSVPPR 



—CRDSCWIHPA 

— CESSCHMPTL 

—CGKACNQLMEK 

— C KESCNENVIGA — 
— RGDDAINEV 



-RIRELLQGMYKEIGI 

LEPSSPTFLLLNK 

AHYSFLSHTSTTEAAYF 

-RNPPAMSPAGQLSRTTEKE 

MRTQVI QETCERRNHI DR 

-AELGMWPSSNHQAIND-— — 
VYDEM5TNRCELLSFLI 
LYDQSNPQRCELLSFLLV- 
HYSEVOTSMLHFVt 



-NYKEIDPSLGHMI- 



— SI LHYIPFTCRLEHALFT- 

ISDEQWTTFLFDFYHY- 

-LRLGVLSGCWNERAVDI 

LSSPFQENWK 

QEDIYCHKFIQQ 

NPDLYLKKF I QQ 

YQQLNRWKYIQQ 

-IOLGLIRDCKIEDIVDT 

YWSTFGN 



-KVPLITVTNSK 

-NKPLITVSNSK 

-KVPLITVSKAK 



-HQGSNFGPI — 
-NPGYPFHPL — 
-KPGYAFHPL — 
-NRNHGLDLR— 



— SSAANNSSOASA- 
—NPSSSHLAGRDM- 
— RTSVESSLDLTK- 
— SSAWQSSHSAA- 
— C P AGNNSTVKEM- 
— RP KALS AFTDNK- 
— CPSGDNFSLKGS- 
— CPSANNSEVKET- 
r- CPSGNNSTVKEI- 
— CPSANNSTVKET- 



-GKMNKLI KT 

-SKAIHTNNA 

-NHQI I ATCNHNL- 
-C KVKTCD I LFR — 
— NVKI PET 



-AEQODAT 



VFQIPAPF 

ATTAWPMQCEHLTLRRT- 



RLLAGVYNDTLQNVI I F * 

KLLKWISSSCEFERSLKR 
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31 
32 
33 
34 
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40 
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47 
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50 

51 
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GAKILWrTFFGYFTDVQKKIXKNKNNNNPS^ 

I REFRQTFRKI I RSHVLRRREPFKAGGTSARALAAHCSDGEQ2 SLRLNGHPPGVWANGS AP HPERRPNGYT- (50 ) 
IQKFRVTFLKIWNDHFRCQPTPPVDEDPPEEAPHD 

NKAFRDTFKIiU/TRTOKRIWRKIPKRPCSVHRTPSRQC 
NATFKKTFKHLLMCHYKNIGATR 

NKTFRTTFKTLLLCQCDKRKRRKQQYQQRQSVIFHKRVPEQAL 
NATF KKTF RHLLLCQRYNI GTAR 
NRTFRKTFKMLLLCRWKKKKVEEKLYWQGNSK1P 

PDFRKAFQGLLCCAPJWVRRRHATHCDRPRASCC IARPGP PS PGAASDDDDDDWGATP PARLLEPWAGCN- (25) 
PDFPJAFQEIJ£LRRSSU(AYGNGYSSNGNTGEQSGYHVEQEK^ (13) 
P DF RS AF RRLLCRCGRRLPPEPCAAARPALFP SGVP AAES SPAQPRLCQRLDG 

SQEFKKAFQNWUQCIJWQSSKHTLCYTUIAP^ (^6) 

SKEFKRAFMRIIZXQCPCGRRRRRRRRLGACAYTYRPWIKGGSIXR (93) 

NQDFRPSFKHILFRRRRRGFRQ 

NQDFRPAFRRILCRPWTQTAW 

NHDFRRAFKKI LCRGDRKRI V 

NHDFRRAFKKI LCRGDRKRIV 

KLDYRRAFKRLLGLN 

NADFRKAFSTLU^YRI£PATNNAIETVSINNNGAAMFSSH^ 

NADFQKVFAQLI^SHFCSRTPVETVNISNELISYNQDIV^ «S) 

N I EF RKAFLKI LHC 

NIEFRKAFLKILSC 

NAEFRNVFRKALRACC 

NEEFRQAFQKIVPFRKAS 

NKDFQNAFKKI I KCNFCRQ 

NKI YRRAFSKYLRCDYKPDKKPPVRQIPRVAATALSGRELNVNI YRHTNERVARKANDPEPGI ENQVENLE- (16) 
NKTYRSAFS RYLQCQYKENRKP LQLI LVNTI P AIAY KS SQLQVGQKKNSQEOAEQTVDDCSWTIXJKQQSE- (17) 
NR0FRTGYQQLFCCR1ANRNSHKTSIJISNASQI3RTQSREP RQQEEKPUCl^ 

GQDFRERL1HALPAS1£RALTEDSTX>TSDTATNSTLPSAEVALQAK 

GQGQWRli^LPSlXlWLTEESWRESKSFTRSTSAnMAQKTQAV 

SSECQRYVYSII^CKESSDPSSYNSSGQI^SKMTTCSSNLNNSIYKIOLT 

RRAVLRRLQPPXSTRPRSLSLQPQLTQRSGLQ 

GQKFRH GLLKILAI HGLI S KD SLP KD5RP5 FVGSSSGHTSTTL 

TKKFRKHI^EKLNIMRSSQKCSRVTTCTGTEMAI PINHTPVNPIKN 

KKFKNCFQSCLCCCCYQSKSIlffSVPh^^SIQWOWEQNNHWrERSSHKDSIN 

KRFKNCFKSCLCCWCOTFEEKQSLEEKQSCLKFKANDHGYONFRSSNKYSSS 

KSFTOFNTQIXCCQPGIW^HSTGRSTTCMrSFKSTtffSATFSIJNRinCHEGYV 

ESFRKHFSNQLCCGQKSYPERSTSYLI^SSAVRMTSIJCSNAKNSArTOSVLLNGHSTKQEIAL 

NRNYRYEIWKAFIFKYSAKTGLTKUDASRVSETEYSALEQNAK 

SANFRQ^ISTIJVCLCPCWRHRRKKRPTFSRKPNSMSSNHAFSTSATRETLY 

G KRFRKK5REVYQAX CRKGGCMGE SVCMENSMGTLRTS I SVDRQI H KLQDWAGNKQ 

SQKPX AAFPJUjC^ KQKPTEKAAWSVAUWSN^ KESDRFSTEIXDI TVTDTWSXTKVS FDDTCLAS EN 

NHRFRSGFRIAFRCCPWVTPTKEDKLELTT>TTSI>ST (17) 

NDRFRLGFKHAFRCCPFISAGDYEGLEMKSTRYLOT^SSVYKV^ (29) 

NKRFRAGFKRftFWCPFIGVSSYDELELKTTRFHPTRQSSLYT^ (34) 

GKKFKKYFIjQIXKYIPPKAKSHSNLSTKMSTLSYRPSEQGNSSTKKPAPCIEVE 

CSSKKKRFKESLKWLTRAFKDEMQPRRQKDNC-OTVTVETW 

TKTFQRDFFUXSKFGCClOUWIYPJUttFSAYTSNC^ 

TKA FQ RD VF I LLSKFG I C KRQAQAYRGQRVPP KNSTDI QVQKVrTHDMRQGAllWEDVVELI ENSH LTPKKQ- (12) 
TKNFRRD FTI LLSKCGCY EMQAQI Y RTETSSTVHNTH PRNGHCSSAP RVTSGSSTY I LVPLSHLAQN 

NKQFRNCMLTTICCGKNPLGODEASATVSKTETSQVAPA 
N RQFRNC I ZfQLFGK KVDDG S E LS SAS KTEVS SVSSVSPA 
NRQFRNCILOLFGKKVDDGSELSSASKTEVSSVSSVSPA 
NKQFOACIMKMVCGKAMTOESDTCSSQKTEVSTVSSTQVGPN 

RNKDVKSVLKKTLCEEVIRSPPSLLHFFLVLCHLPCFIFCY 

RNSDMKAALRKVLAMRFPSKQ 

RNKDVKEALRRTVKGK 

RNKDVKRALERLLEGNC KVH HWTG 

RNRDMKRALI RVICSMKITL 

RNQ0VKRALRRTLHLAQDOEANTNKGSKIG 

RNRDMKQALI RVTCSKKI SLPW 

RNRDIKDALEKIMCKKQIPSFL 

RNRDMKRALI RVI CTKKI S L 

RNRDMKEALI RVLCKKKITFCL 

SKDIJlHAFRSMFPSCEGTAQPji>NSbi;DST)CUlKHANNTASMHRA^ 

NE^FRVEIJ<ALLS^QRPPKPEDRLPSPVPSFRVAWTEKSHGRRAPLPNHHI^SSQIQSGKTDI^SVEPWA^ 
NKNFQRDLQFFFNFCDFRSRDGRTTRL 

NKEMRRAFIRJ^^CCKCPSGDSAGKFKRPIIAGMEFSP3KSDNSSHPQKDEGDNPETIMSSG^^SSS 
KKFKETYFVI LRAC RLCQT5D SLDSNLEQTTE 

SDNFKRSFQRI LC LSWMDNAA EE PVDY YATALKS RAYS VEDFQP EN LESGGVFRNGTC AS RI STL 

GRDKSORLV^PLRVVFQRALRIJGAEPGDAASSTPWVIMEMQCPSGNAS 

GHDFIXJRMRQCFRGQLLDRAFLRSQQNQRA 

GTQMRKI»fnTLRVFACCCVKQEIPYODIDI 

GTKFRKNYTVCWPSFASDSFPAMYPGTTA 
... β2-adrenergic receptor. Amino acid residues in black are conserved in nearly all of the GPRs. Stippled residues are conserved in cationic amine receptors. Boxed residues are conserved in all catecholamine receptors. The asterisk denotes the serine conserved in the serotonin receptors. Residues in diamonds are believed to be involved in G-protein coupling. Glycosylated asparagines (N-6 and N-15) within the amino terminus are indicated. Cys341 of the carboxyl terminus is known to be palmitoylated. Protein kinase A phosphorylation sites are indicated by arrowheads.



LIGAND BINDING DOMAINS 

Our understanding of the structure of the binding site of the GPRs, and of which residues actually interact with agonists and antagonists, is rapidly evolving. The ligand binding sites of rhodopsin and of the adrenergic and muscarinic receptors have been partly delineated through biochemical and molecular biological approaches (for review, see Applebury and Hargrave, 1986; O'Dowd et al., 1989b; Strader et al., 1989b; Venter et al., 1989; Hulme et al., 1990). For most of the GPRs, with the possible exception of the glycoprotein hormone receptors, the ligand binding pocket appears to be formed by the membrane-spanning regions.

As none of the GPRs has yet been crystallized, modeling of the three-dimensional array of the helices is based on the structure of bacteriorhodopsin, which has recently been determined at high resolution (Henderson et al., 1990). The transmembrane domains appear to form a hydrophilic pocket for ligand binding, surrounded by hydrophobic residues (Strader et al., 1989b; Venter et al., 1989; Hulme et al., 1990). The putative arrangement of the residues around the ligand binding site has been analyzed through helical wheel modeling. An α-helix contains 3.6 residues per turn, so successive residues are displaced by 100° around the helix axis. When the distribution of residues around the helix is predicted for the muscarinic receptors (Hulme et al., 1990), the adrenergic receptors (Strader et al., 1989b; Venter et al., 1989), and many other GPRs (Donnelly et al., 1989), the transmembrane domains show a predominance of hydrophobic residues on one side and hydrophilic residues on the other. The hydrophilic side of each helix is postulated to face inwards and form the polar ligand binding site. Recently, computer-generated models of ligand-receptor interactions have been developed (Findlay and Eliopoulos, 1990; Henderson et al., 1990; Dahl et al., 1990; Hibert et al., 1991).
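Helical wheel analysis rests on simple geometry: at 3.6 residues per turn, each residue sits 100° around the helix axis from the one before it. The sketch below is an illustration, not the procedure used in the cited studies. It projects a transmembrane segment onto such a wheel and computes a hydrophobic moment (the hydropathy-weighted vector sum around the wheel, in the spirit of Eisenberg's formulation) to indicate which face of the helix is hydrophobic. It assumes an ideal α-helix and the Kyte-Doolittle hydropathy scale; both choices, and the function names, are ours.

```python
import math

# Kyte-Doolittle hydropathy scale (positive = hydrophobic); an assumed choice of scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Ideal alpha-helix: 3.6 residues per turn, so 360 / 3.6 = 100 degrees per residue.
DEGREES_PER_RESIDUE = 100.0


def helical_wheel(seq):
    """Return (residue, angle in degrees) for each residue of an ideal helix."""
    return [(aa, (i * DEGREES_PER_RESIDUE) % 360.0) for i, aa in enumerate(seq.upper())]


def hydrophobic_moment(seq, scale=KYTE_DOOLITTLE):
    """Return (magnitude, direction in degrees) of the hydropathy vector sum.

    Hydrophobic residues pull the vector toward their side of the wheel and
    hydrophilic residues push it away, so the direction marks the hydrophobic face.
    """
    x = y = 0.0
    for aa, angle in helical_wheel(seq):
        h = scale[aa]  # raises KeyError for non-standard residue letters
        x += h * math.cos(math.radians(angle))
        y += h * math.sin(math.radians(angle))
    return math.hypot(x, y), math.degrees(math.atan2(y, x)) % 360.0


# Toy 18-residue helices (exactly five turns), not real receptor segments.
uniform = "L" * 18
amphipathic = "".join("L" if (i * 100) % 360 < 180 else "K" for i in range(18))
print(hydrophobic_moment(uniform))      # magnitude close to zero
print(hydrophobic_moment(amphipathic))  # large magnitude, direction about 80 degrees
```

A uniformly hydrophobic segment gives a moment near zero, whereas confining hydrophobic residues to half of the wheel gives a large moment; in this picture, the face opposite the moment's direction would be the candidate lining of the polar binding pocket.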

The presence of a ligand binding pocket for the chromophore retinal deep within the transmembrane α-helices of rhodopsin was suggested by cross-linking and fluorescence energy transfer studies (Hargrave et al., 1982; Thomas and Stryer, 1982). Retinal forms a Schiff base linkage with Lys296 in TM 7 (Thomas and Stryer, 1982). Glu113 of rhodopsin, in TM 3, has been proposed as a counterion that interacts with the protonated Schiff base of retinal, although mutagenesis studies have been inconclusive (Sakmar et al., 1989; Zhukovsky and Oprian, 1989).

A variety of approaches support the existence of a similar intrahelical binding site in the cationic amine receptors. The ligand binding site of the adrenergic receptors has been investigated by photoaffinity labeling (Bar-Sinai et al., 1986; Dohlman et al., 1988), fluorescence emission spectroscopy (Tota and Strader, 1990), deletion mutants (Dixon et al., 1987a,b), site-directed mutagenesis (Chung et al., 1988; Dixon et al., 1988; Strader et al., 1988; Fraser, 1989; Strader et al., 1989a,b; Wang et al., 1991), and receptor chimeras (Kobilka et al., 1988). As was the case for the visual pigments, the transmembrane domains are necessary for ligand binding and confer ligand specificity, while the hydrophilic extracellular and intracellular domains are not directly involved in ligand binding (Dixon et al., 1987a,b). In both the α- and β-adrenergic receptors (Strader et al., 1988; Wang et al., 1991), as well as in the m1 muscarinic receptor (Fraser et al., 1989), site-directed mutagenesis has demonstrated that the TM 3 aspartate (Asp113 in the β-adrenergic receptor; see Fig. 3) is critical for wild-type agonist and antagonist binding. The cationic amines, which include epinephrine, norepinephrine, dopamine, serotonin, and acetylcholine, all contain a positively charged amine head group that most likely interacts with the conserved TM 3 aspartate found in these receptors.
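Statements that a residue is "conserved" in a receptor group are read off a multiple alignment such as that of Fig. 2. A minimal sketch of that step, assuming the alignment is stored as equal-length gapped strings with gaps written as '-', reports for each column the most common residue and the fraction of all sequences that carry it. The receptor names and fragments in the example are hypothetical toy data, not real receptor sequences.

```python
from collections import Counter


def column_conservation(alignment):
    """Return (most common residue, fraction of sequences) for every column.

    `alignment` maps names to equal-length gapped sequences ('-' = gap).
    Gaps count against conservation but are never reported as the residue.
    """
    seqs = list(alignment.values())
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must all have the same length")
    profile = []
    for col in range(length):
        counts = Counter(s[col] for s in seqs if s[col] != "-")
        if not counts:
            profile.append(("-", 0.0))
            continue
        residue, n = counts.most_common(1)[0]
        profile.append((residue, n / len(seqs)))
    return profile


def conserved_columns(alignment, threshold=1.0):
    """Return 0-based (column, residue) pairs whose consensus reaches `threshold`."""
    return [
        (col, residue)
        for col, (residue, fraction) in enumerate(column_conservation(alignment))
        if fraction >= threshold
    ]


# Hypothetical toy fragments, not real receptor sequences.
toy = {"receptor_a": "LAIDRY", "receptor_b": "LSVDRY", "receptor_c": "-AIDRF"}
print(conserved_columns(toy))  # [(3, 'D'), (4, 'R')]
```

Lowering `threshold` (for example to 0.9) would capture residues conserved in "nearly all" sequences rather than in every one.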

Mutagenesis studies have suggested that particular residues conserved within receptor subclasses can contribute to agonist specificity. Two conserved serines in TM 5 (Ser204 and Ser207 in the β2-adrenergic receptor) have been implicated in forming hydrogen bonds with the meta- and para-hydroxyl groups of adrenergic agonists. Replacement of either serine by alanine reduces agonist binding to the same degree as removing the corresponding hydroxyl group from the ligand (Strader et al., 1989a). Recent mutagenesis studies of the α2-adrenergic receptor suggest that Ser204 (corresponding in position to Ser207 of the β-adrenergic receptor) binds the para-hydroxyl group of adrenergic ligands in an analogous fashion (Wang et al., 1991). Two corresponding serine residues are found in TM 5 in all the dopamine receptors, and a single conserved serine residue is found in TM 5 of the serotonin receptors (Fig. 2). The similarity of ligand structure and receptor sequence suggests that these TM 5 serines may also hydrogen bond with the aromatic hydroxyl groups of their respective agonists. This hypothesis is supported by computer modeling of the ligand-receptor interaction (Hibert et al., 1991). The muscarinic receptors all contain a conserved asparagine in TM 6, not found in any other receptor subclass, which has been proposed to interact with the ester group of acetylcholine (Hibert et al., 1991). Conserved TM 6 and/or TM 7 aromatic residues (e.g., Phe290 and Tyr316 in the β-adrenergic receptor) may interact with the aryl ring of serotonergic and adrenergic ligands (Dixon et al., 1988; Hibert et al., 1991).
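The comparison made by Strader et al. (1989a), a serine-to-alanine receptor mutation versus removal of the matching ligand hydroxyl, can be expressed as two binding free-energy penalties, ΔΔG = RT ln(K_modified/K_reference), where K is a dissociation constant. Roughly equal penalties are what one would expect if both changes remove the same hydrogen bond. The sketch below is a generic thermodynamic calculation, not taken from the cited work, and the Kd values in the usage lines are hypothetical.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # kcal / (mol * K)


def ddg_binding(kd_reference, kd_modified, temperature_k=298.15):
    """Binding free-energy penalty (kcal/mol) from a change in dissociation constant.

    ddG = RT * ln(Kd_modified / Kd_reference); positive values mean the
    modification (a receptor mutation or a ligand change) weakened binding.
    """
    if kd_reference <= 0 or kd_modified <= 0:
        raise ValueError("dissociation constants must be positive")
    return GAS_CONSTANT_KCAL * temperature_k * math.log(kd_modified / kd_reference)


# Hypothetical Kd values (molar), chosen only to illustrate the comparison.
receptor_mutation = ddg_binding(1e-7, 1e-6)       # serine -> alanine in the receptor
ligand_dehydroxylation = ddg_binding(5e-8, 5e-7)  # hydroxyl removed from the agonist
print(round(receptor_mutation, 2), round(ligand_dehydroxylation, 2))  # 1.36 1.36
```

A 10-fold loss of affinity corresponds to about 1.36 kcal/mol at 25 °C, regardless of the absolute Kd, which is why the two manipulations can be compared directly.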

The ligands for the glycoprotein hormone receptors, thyroid-stimulating hormone (TSH), follicle-stimulating hormone (FSH), and luteinizing hormone/chorionic gonadotropin (LH/CG), are much larger than the ligands for the other GPRs. Presumably because of the large size of these ligands, this receptor subclass has evolved a distinct structure, containing an extremely long amino-terminal extracellular domain that encompasses the high-affinity hormone binding site. This glycosylated extracellular domain is rich in cysteine residues, which may form disulfide bridges and help maintain the three-dimensional structure of the proteins (Sprengel et al., 1990). The large amino-terminal extracellular domain of these receptors contains multiple leucine-rich repeats that identify these GPRs as members of a second gene family, the leucine-rich glycoprotein family (Takahashi et al., 1985; Krusius and Ruoslahti, 1986). The extracellular location of a hormone binding site is supported by chimera studies (Moyle et al., 1991; Nagayama et al., 1991b) and, in the case of the LH/CG receptor, by the secretion of a soluble hormone binding protein, generated by alternative splicing, that encompasses only the amino terminus (Loosfelt et al., 1989; Tsai-Morris et al., 1990). Short regions of the amino terminus of the TSH and LH/CG receptors are necessary for high-affinity hormone binding (Wadsworth et al., 1990; Nagayama et al., 1991a,b). In contrast, β-adrenergic receptor ligand binding does not depend on the amino-terminal extracellular domain (Dixon et al., 1987b).

Recently, the binding and activation of an LH/CG receptor construct in which virtually the entire extracellular amino terminus had been deleted were investigated (Ji and Ji, 1991b). The finding that CG can bind to the seven-transmembrane portion of the receptor, albeit with lower affinity, in the absence of the extracellular amino terminus suggests that this receptor may contain both a high-affinity binding site extracellularly and a low-affinity site within the transmembrane domains. CG binding to this low-affinity receptor mutant was capable of stimulating cAMP production. Possibly the high-affinity extracellular binding site serves to capture the hormone and present it to the intramembranous binding pocket for signal transduction.
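One simple way to picture a high-affinity extracellular site plus a low-affinity transmembrane site is a two-site mass-action binding curve. The sketch below assumes two independent, non-interacting classes of sites, which is a simplification: the capture-and-present mechanism suggested above would couple the two sites. The function name and all parameter values are hypothetical, chosen only to show that removing the high-affinity site shifts binding toward much higher hormone concentrations.

```python
def two_site_bound(ligand, bmax_high, kd_high, bmax_low, kd_low):
    """Specific binding to two independent classes of sites (law of mass action).

    Each class contributes Bmax * L / (Kd + L); the classes do not interact.
    Units are arbitrary but must be consistent (e.g., nM for L and both Kd).
    """
    if ligand < 0 or bmax_high < 0 or bmax_low < 0:
        raise ValueError("concentrations and capacities must be non-negative")
    if kd_high <= 0 or kd_low <= 0:
        raise ValueError("dissociation constants must be positive")
    high = bmax_high * ligand / (kd_high + ligand)
    low = bmax_low * ligand / (kd_low + ligand)
    return high + low


# Hypothetical parameters: high-affinity Kd 1 nM, low-affinity Kd 100 nM.
for hormone_nm in (1.0, 10.0, 100.0):
    intact = two_site_bound(hormone_nm, 1.0, 1.0, 1.0, 100.0)
    truncated = two_site_bound(hormone_nm, 0.0, 1.0, 1.0, 100.0)  # amino terminus deleted
    print(hormone_nm, round(intact, 3), round(truncated, 3))
```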

INTRACELLULAR COUPLING 

The GPRs are coupled by heterotrimeric G-proteins to various intracellular enzymes, ion channels, and transporters (Johnson and Dhanasekaran, 1989; Birnbaumer et al., 1990). The G-proteins, which associate with GPRs at the intracellular face of the plasma membrane, are composed of relatively invariant β- and γ-subunits and a variable α-subunit (αs, αi, αo) for which the G-protein is named (Gs, Gi, Go). By a process not yet understood, GPR agonist binding catalyzes the exchange of GTP for GDP on the α-subunit (G-protein "activation"), resulting in dissociation of the α-subunit and regulation of one or more of the various signal-transducing enzymes and channels. The different G-protein α-subunits preferentially regulate particular effectors. The specificity of signal transduction may therefore be determined by the specificity of G-protein coupling. Some GPR residues or regions that are necessary for efficient signal transduction can be postulated to interact with conserved G-protein motifs. In addition, certain short amino acid stretches of the receptors that are necessary for G-protein coupling also determine the specificity of the G-protein interactions.

Three types of studies investigating the relationship between receptor structure and G-protein affinity have been performed. Deletion and site-directed mutagenesis studies implicate receptor regions and amino acid residues that are necessary for efficient G-protein coupling. Synthetic peptide competition studies suggest which oligopeptide domains may directly interact with the G-proteins. Chimera experiments delineate the receptor regions that determine the specificity of G-protein coupling. Certain general principles arise from these multifaceted investigations. All of the intracellular domains have been implicated in efficient G-protein coupling of various receptors. Short stretches of the membrane-proximal regions of the third cytoplasmic loop, and possibly of the carboxyl terminus, appear particularly critical in determining the specificity of G-protein coupling for many receptors.

Single-residue mutations in the cytoplasmic loops of the β2-adrenergic receptor reduce signal transduction (Dixon et al., 1988; O'Dowd et al., 1988). Site-directed mutagenesis of a conserved proline in the second intracellular loop to threonine, for example, caused no change in agonist binding but an approximately 35% reduction in adenylate cyclase stimulation (O'Dowd et al., 1988). Site-directed mutagenesis has identified particular charged residues in the membrane-proximal regions of the second and third intracellular loops that contribute to efficient G-protein coupling. Mutation of the highly conserved aspartate adjacent to TM 3 in the second intracellular loop of the β-adrenergic receptor (Asp130, in the "DRY" sequence) gives rise to a receptor with high-affinity ligand binding but reduced or absent G-protein coupling (Dixon et al., 1988; Fraser et al., 1988). Similar results have been obtained for the muscarinic m1 and the α2-adrenergic receptors (Fraser et al., 1988, 1989; Wang et al., 1991). The corresponding glutamate of rhodopsin is similarly implicated in the interaction with transducin (Franke et al., 1990). Another residue needed for transducin activation by rhodopsin is the lysine located in the distal third intracellular loop, Lys248. Mutation of this lysine to leucine results in a complete loss of signal transduction (Franke et al., 1988). Mutation or deletion of the histidine at the corresponding position in the β-adrenergic receptor reduced, although it did not abolish, adenylate cyclase stimulation (O'Dowd et al., 1988).

The TM 2 aspartate (Asp79 of the β2-adrenergic receptor; see Fig. 3), which is conserved in virtually all GPRs, is necessary for wild-type agonist binding and G-protein activation in many of the GPRs studied (Chung et al., 1988; Strader et al., 1988; Fraser et al., 1989; Wang et al., 1991; Ji and Ji, 1991a). In the α2-adrenergic and dopamine D2 receptors, this aspartate is essential for modulation of receptor coupling by Na+ and H+, possibly through allosteric modulation of receptor conformation (Horstman et al., 1990; Neve, 1991; Neve et al., 1991). Another transmembrane residue, the TM 6 cysteine found in most GPRs, has been implicated in β-adrenergic receptor signal transduction (Fraser, 1989).



Deletion studies of the β2-adrenergic receptor have indicated that the membrane-proximal regions of the third cytoplasmic loop (residues 222-229 and 258-270; see Fig. 3) are necessary for signal transduction (Strader et al., 1987a; O'Dowd et al., 1988). In the α1-adrenergic receptor, deletion of seven amino acids of the third intracellular loop proximal to TM 6 caused a marked reduction in coupling to phospholipase C (Cotecchia et al., 1990).

Deletions of carboxy-terminal residues adjacent to TM 7 produce mutant rhodopsin or β2-adrenergic receptors with a diminished ability to activate G-proteins (O'Dowd et al., 1988; Franke et al., 1990). Mutation of the palmitoylated carboxy-terminal Cys341 to glycine markedly reduced agonist stimulation of adenylate cyclase by the β2-adrenergic receptor (O'Dowd et al., 1989a). The palmitoylated cysteine is predicted to anchor the carboxyl terminus to the membrane, producing a fourth cytoplasmic loop (see Fig. 3). Membrane anchorage may optimally position carboxy-terminal residues for G-protein interaction (Ovchinnikov et al., 1988; O'Dowd et al., 1989a). Regions of the carboxyl terminus and third cytoplasmic loop adjacent to the transmembrane domains may form clustered amphipathic α-helices (Strader et al., 1987a; Higashijima et al., 1988; Strader et al., 1989b; Palm et al., 1990). These helices, along with charged intracellular residues of the second and third intracellular loops (e.g., the DRY sequence), may act cooperatively to bind and activate G-proteins efficiently.

The activation of G-proteins by amphipathic α-helices is supported by experiments in which the G-proteins Gi and Go have been directly activated by mastoparan and other small peptides that form amphipathic α-helices at the inner surface of the cytoplasmic membrane (Higashijima et al., 1988, 1990). Furthermore, direct activation of Gs has been demonstrated for synthetic peptides representing the third intracellular loop sequences adjacent to TM 5 and TM 6 of the β2-adrenergic receptor (Cheung et al., 1991), and for a peptide representing the third intracellular loop sequence proximal to TM 6 of the avian β-adrenergic receptor (Palm et al., 1989; Munch et al., 1991).

Peptide competition experiments, in which short synthetic peptides competitively bind to G-proteins but do not activate them, have been useful in mapping GPR regions that are likely to contact the G-proteins. Receptor uncoupling following mutagenesis or deletion of receptor segments may be due either to loss of G-protein contacts or to altered tertiary structure of the receptors. Competition studies have therefore been invaluable in confirming that the loss of signal transduction observed in deletion and mutagenesis studies involves residues that directly bind the G-proteins. The regions implicated by peptide competition studies include the membrane-proximal regions of all three cytoplasmic loops and the carboxyl terminus of the avian β-adrenergic receptor (Palm et al., 1989; Munch et al., 1991); the second intracellular loop and the carboxy-terminal portion of the third intracellular loop of the α2-adrenergic receptor (Dalman and Neubig, 1991); and the second and third intracellular loops and the amino-terminal region of the carboxyl terminus of rhodopsin (Konig et al., 1989).
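The logic of a peptide competition experiment can be sketched with a simple competitive-binding model: the receptor and a non-activating peptide compete for the same site on the G-protein, so the fraction of G-protein engaged by the receptor falls as peptide is added. This sketch assumes rapid equilibrium, mutually exclusive binding, and binding partners present in excess so that free and total concentrations are approximately equal. The function name and the dissociation constants are hypothetical.

```python
def fraction_receptor_bound(receptor, k_receptor, peptide=0.0, k_peptide=float("inf")):
    """Fraction of G-protein bound to receptor when a peptide competes for the same site.

    Assumes rapid equilibrium, mutually exclusive binding, and receptor and
    peptide in excess (free ~ total). Concentrations and Kd values share one unit.
    """
    if receptor < 0 or peptide < 0:
        raise ValueError("concentrations must be non-negative")
    if k_receptor <= 0 or k_peptide <= 0:
        raise ValueError("dissociation constants must be positive")
    r = receptor / k_receptor  # receptor occupancy term
    p = peptide / k_peptide    # competing peptide occupancy term
    return r / (1.0 + r + p)


# Hypothetical values: receptor at its Kd, peptide at increasing multiples of its Kd.
for multiple in (0.0, 1.0, 10.0, 100.0):
    print(multiple, round(fraction_receptor_bound(1.0, 1.0, multiple, 1.0), 3))
```

Because the peptide binds without activating, any receptor-dependent signal in this model scales with the returned fraction, which is why loss of signal at increasing peptide concentration points to a shared contact site.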

Chimera experiments involving hybrid α2/β2-adrenergic receptors suggested that the third cytoplasmic loop may underlie the coupling specificity of the adrenergic receptors (Kobilka et al., 1988). The β2-receptor is positively coupled to adenylate cyclase through Gs, whereas the α2-receptor is negatively coupled to this enzyme through Gi. β2-Adrenergic receptors were generated in which the third cytoplasmic loop was replaced by the third cytoplasmic loop of the α2-adrenergic receptor. Activation of this chimeric receptor, which still has β2 pharmacology, caused inhibition instead of stimulation of adenylate cyclase (Kobilka et al., 1988). Similar results have been obtained for other cationic amine receptor hybrids. The dopamine D2 receptor is negatively coupled to adenylate cyclase, whereas the β2-adrenergic receptor stimulates adenylate cyclase. Substitution of the third cytoplasmic loop of the phospholipase-coupled muscarinic m1 receptor into the dopamine D2 receptor, and of the same region of the phospholipase-linked α1-adrenergic receptor into the β2-adrenergic receptor, caused the resulting chimeras to stimulate phosphatidylinositol hydrolysis and mobilize calcium (Cotecchia et al., 1990; England et al., 1991).
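At the sequence level, the chimeras described above replace one segment of a host receptor (for example, its third cytoplasmic loop) with the corresponding segment of a donor receptor, with boundaries usually chosen from an alignment. The hypothetical helper below models only the resulting protein sequence (the actual constructs are made at the DNA level); the sequences and coordinates in the example are toy values, not real loop boundaries.

```python
def swap_segment(host, donor, host_span, donor_span):
    """Return `host` with residues in host_span replaced by donor residues in donor_span.

    Spans are 1-based and inclusive, matching the way residue numbers are quoted.
    """
    for (start, end), seq in ((host_span, host), (donor_span, donor)):
        if not 1 <= start <= end <= len(seq):
            raise ValueError(f"span {start}-{end} is outside a sequence of length {len(seq)}")
    h_start, h_end = host_span
    d_start, d_end = donor_span
    return host[: h_start - 1] + donor[d_start - 1 : d_end] + host[h_end:]


# Toy sequences: uppercase host, lowercase donor "loop" so the swap is visible.
host = "AAAABBBBCCCC"  # hypothetical: residues 5-8 stand in for the host loop
donor = "xxyyyyyzz"    # hypothetical: residues 3-7 stand in for the donor loop
print(swap_segment(host, donor, (5, 8), (3, 7)))  # AAAAyyyyyCCCC
```

The swapped segment need not match the original in length, as loops of different receptors often differ in size; the rest of the host, including its ligand binding transmembrane domains, is left untouched.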

The receptor region of the third cytoplasmic loop most important in determining the specificity of signal transduction may differ between the muscarinic and adrenergic receptors. Analysis of signal transduction by α2/β2-adrenergic receptor chimeras, in which short segments of the membrane-proximal regions of the third intracellular loop and of the carboxyl terminus were exchanged, indicated that the segment of the third intracellular loop adjacent to TM 6 is most important in adrenergic receptor coupling specificity (Liggett et al., 1991). Substitution of multiple segments
suggested that all of these domains may coordinately contribute to G-protein coupling (Liggett et al., 1991). By contrast, interchange of seven amino acids from the third intracellular loop adjacent to TM 5 was sufficient to change the coupling specificity of a muscarinic m1/m2 chimera (Kubo et al., 1988). In another series of experiments, substitution of nine amino acids from the amino-terminal portion of the third cytoplasmic loop of the muscarinic m3 receptor into the m2 receptor conferred a pattern of calcium release characteristic of m3 receptor activation (Lechleiter et al., 1991).

Phosphorylation of cytoplasmic residues has been identified as an important mechanism for regulating the G-protein coupling of some GPRs. The third cytoplasmic loop and carboxyl terminus are rich in serine and threonine residues that are potential phosphorylation sites. After activation, both rhodopsin and the β2-adrenergic receptor are desensitized through the action of receptor kinases. The photoactivated form of rhodopsin is phosphorylated in the carboxyl terminus by a specific rhodopsin kinase (Hargrave et al., 1982). This phosphorylation allows binding of the protein arrestin, which interferes with G-protein coupling to the opsin. A similar mechanism has been identified for the β-adrenergic receptor, in which β-adrenergic receptor kinase (βARK) phosphorylates the carboxyl terminus of the receptor. This leads to binding of β-arrestin and functional uncoupling of the receptor. βARK can also phosphorylate the third cytoplasmic loops of agonist-stimulated muscarinic and α2-adrenergic receptors, as well as the carboxyl terminus of photoactivated rhodopsin (Benovic et al., 1986, 1987a; Kwatra et al., 1989). Many receptors contain cytoplasmic consensus sequences for protein kinase A phosphorylation. In the case of the β-adrenergic receptors, these sites play a role in receptor desensitization (Clark et al., 1989). The TSH receptor, which does not contain consensus sequences for protein kinase A phosphorylation, does not demonstrate agonist-induced desensitization (Takasu et al., 1978).
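As a hedged illustration of what scanning a cytoplasmic segment for such consensus sequences involves, the sketch below assumes the commonly cited minimal protein kinase A motif Arg-Arg-X-Ser/Thr (R-R-X-S/T). The motif definition, the function names pka_sites and ser_thr_fraction, and the toy sequence are illustrative assumptions, not anything taken from the studies cited above. The script also reports the Ser/Thr fraction, the feature noted above for the third cytoplasmic loop and carboxyl terminus.

```python
import re
from typing import List, Tuple

# Assumed minimal PKA consensus: Arg-Arg-X-Ser/Thr (R-R-X-S/T).
# The zero-width lookahead lets overlapping motifs (e.g. "RRRSS") all be reported.
PKA_MOTIF = re.compile(r"(?=(RR[A-Z][ST]))")


def pka_sites(seq: str) -> List[Tuple[int, str]]:
    """Return (1-based position of the phosphoacceptor Ser/Thr, motif) pairs."""
    seq = seq.upper()
    # m.start() is the 0-based index of the first Arg; the acceptor lies three
    # residues later, so its 1-based position is m.start() + 4.
    return [(m.start() + 4, m.group(1)) for m in PKA_MOTIF.finditer(seq)]


def ser_thr_fraction(seq: str) -> float:
    """Fraction of residues that are Ser or Thr (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(1 for aa in seq if aa in "ST") / len(seq)


if __name__ == "__main__":
    # Hypothetical toy segment, not a real receptor sequence.
    loop = "AARRASLGRRGTK"
    print(pka_sites(loop))
    print(round(ser_thr_fraction(loop), 3))
```

Run as a script, this prints [(6, 'RRAS'), (12, 'RRGT')] and 0.154. A pattern scan of this kind only flags candidate sites; whether a site is actually phosphorylated depends on its accessibility and on kinase context, which sequence alone cannot establish.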

Structure/function models of the mechanism of G-protein activation by GPRs must also account for the recent identification of G-protein coupled receptors that are not members of the GPR gene family, and of peptides that are capable of directly activating G-proteins. The secretin receptor, while distinct in sequence, is predicted to have a seven-transmembrane domain structure (Ishihara et al., 1991). Although the metabotropic glutamate receptor also has seven closely spaced hydrophobic domains, its hydrophobicity profile predicts an additional potential membrane-spanning domain distant from the other seven (Masu et al., 1991). The activation of heterotrimeric G-proteins has been implicated in the signal transduction of several membrane receptor tyrosine kinases.
These receptors, which bear no overall structural or sequence resemblance to the GPR family, include the insulin receptor, the insulin-like growth factor-II receptor, the epidermal growth factor receptor, and the colony-stimulating factor-1 receptor encoded by the c-fms proto-oncogene (Imamura and Kufe, 1988; Nishimoto et al., 1989; Luttrel et al., 1990; Liang and Garrison, 1991). A 14-amino-acid segment of the insulin-like growth factor-II receptor, which strikingly resembles in its charge distribution both the amphipathic peptide mastoparan and the membrane-proximal regions of the third cytoplasmic loop of the GPRs, specifically activates the heterotrimeric G-protein Gi (Okamoto et al., 1990; Nishimoto et al., 1991). The oncogenic activity of the v-fps protein, a cytosolic tyrosine kinase, may also involve activation of a heterotrimeric G-protein (Alexandropoulus et al., 1991).
GAP-43 is a growth cone protein that activates Go. A decapeptide domain of GAP-43 that is homologous to the membrane-proximal carboxyl terminus of many GPRs was found to be responsible for its association with Go (Strittmater et al., 1990). Amphiphilic neuropeptides, including substance P, ACTH, and bradykinin, can also activate G-proteins in a receptor-independent fashion (for review see Mousli et al., 1990). Further delineation of the structural motifs that mediate G-protein coupling of these nonhomologous receptors and peptides should illuminate the mechanisms of receptor/G-protein interaction in general.

GENE STRUCTURE AND EVOLUTION 

Molecular cloning has revealed that a panoply of receptor subtypes exists for most of the classical neurotransmitters. Because the basic transmitters appeared very early in phylogeny (see Walker and Holden-Dye, 1989), the subsequent evolution of multiple receptor subtypes served the need for greater signaling specificity in progressively more complex nervous systems. Analysis of nucleotide sequences and gene structure may elucidate the time frame and mechanisms of subfamily and subtype evolution.

The remarkable conservation of the transmembrane domains of the GPR family proteins suggests that these genes may have evolved from a common precursor. A phylogenetic tree of the GPR family, generated by nucleotide sequence comparison, suggests that the opsins diverged from the catecholamine receptors between 1 and 1.5 billion years ago (Yokoyama et al., 1989). An age of more than 1 billion years for the GPR gene family is independently suggested by the isolation of a Dictyostelium chemoattractant receptor with structural and sequence homology to the GPRs (Klein et al., 1988). Although several seven-transmembrane yeast pheromone receptors have been identified (Hagen et al., 1986; Marsh and Herskowitz, 1988), these proteins have little amino acid sequence homology with the GPRs. The evolutionary relationship, if any, among these yeast receptors, the seven-transmembrane protein bacteriorhodopsin, and the GPR superfamily remains to be determined.
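Divergence-time estimates of this kind rest on a molecular-clock assumption. The generic relation below is offered as an illustration only, not as the specific procedure of Yokoyama et al. (1989); K is the number of nucleotide substitutions per site separating two homologous sequences (ideally corrected for multiple substitutions at the same site), r is an assumed constant substitution rate per site per year in each lineage, and the factor of 2 reflects that both lineages accumulate substitutions after the split:

$$
T = \frac{K}{2r}
$$

For hypothetical values K = 0.2 and r = 10^{-10} substitutions per site per year, T = 0.2 / (2 × 10^{-10}) = 10^{9} years.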

Many of the GPR genes characterized to date are intronless. Several GPR genes, however, have introns within their coding regions. These include the opsins (Nathans and Hogness, 1984; Nathans et al., 1986), the dopamine D2, D3, and D4 receptors (Grandy et al., 1989; Sokoloff et al., 1990; Gandelman et al., 1991; Van Tol et al., 1991), the substance P receptor (Hershey et al., 1991), the substance K receptor (Gerard et al., 1990), the luteinizing hormone receptor (Frazier et al., 1990; Tsai-Morris et al., 1990), and a Drosophila muscarinic receptor gene (Shapiro et al., 1989). Introns have not been found within the coding regions of the mammalian muscarinic receptor genes isolated to date. For the GPR genes with introns, the locations of introns near or within the seven transmembrane domains are illustrated in Fig. 4. Introns tend to be positioned between TM domains.



Two mechanisms of gene evolution, gene duplication and retroposition, both appear to have played a role in generating the complex multiplicity of the GPR family. Genes containing introns, like those for the dopamine D2, D3, and D4 receptors, most likely evolved from one another by gene duplication (Ohno, 1970). This is supported by the relative preservation of intron locations among these receptors (see Fig. 4). An intron site adjacent to the region encoding TM 3 is also preserved between the dopamine and tachykinin receptor genes, as is a different intron site, adjacent to the region encoding TM 7, between the tachykinin receptor and opsin genes; this suggests that gene duplication of a common precursor may have played a role in the evolution of these receptors. Most of the receptor genes are intronless, raising the possibility that one or more of them arose through reverse transcription of mRNA and incorporation of the resulting cDNA into the genome (Brosius, 1991), an event referred to as retroposition. Gene duplication may have further amplified the number of these intronless genes.

Another potential mechanism for generating functionally distinct receptors is alternative processing of the primary RNA transcript. Alternative splicing of a free-standing exon of the dopamine D2 receptor gene gives rise to two receptor isoforms that differ in the presence or absence of a 29-amino-acid segment in the third cytoplasmic loop (Grandy et al., 1989). Although the functional difference between the two isoforms remains to be elucidated, their biological importance is suggested by the preservation of the alternative splice site through at least 80 million years of evolution, from mouse to man (Montmayeur et al., 1991). Alternative splicing also gives rise to multiple forms of the dopamine D3 receptor mRNA (Giros et al., 1991; Snyder et al., 1991) and of the LH/CG receptor mRNA. One alternatively spliced LH/CG receptor mRNA encodes a secreted LH-binding protein lacking the transmembrane regions (Loosfelt et al., 1989; Tsai-Morris et al., 1990).
FIG. 4. Schematic representation of GPR genes that have introns within the protein coding region. Abbreviations: h-D2, human dopamine D2 receptor; r-D3, rat dopamine D3 receptor; h-D4, human dopamine D4 receptor; r-SP, rat substance P receptor; h-SK, human substance K receptor; h-Rhod, human rhodopsin; h-ops(B), human blue opsin; h-ops(G), human green opsin; h-ops(R), human red opsin; d-M, Drosophila muscarinic receptor. The locations of introns are indicated by arrows.
Convergent evolution is also evident among G-protein coupled receptors. The examples of the secretin and metabotropic glutamate receptors, in which apparently unrelated genes have evolved similar seven-transmembrane structures and G-protein coupling, have already been discussed. Comparison of the nucleotide sequences of the red and green visual pigment genes of fish and humans indicates that the red pigments evolved independently from green pigments in the two lineages through identical amino acid substitutions (Yokoyama and Yokoyama, 1990).

SUMMARY 

The identification of new GPR genes and the elucidation of their binding and G-protein coupling mechanisms will undoubtedly continue to accelerate. In particular, the binding sites and coupling domains of the non-glycoprotein hormone peptide receptors have not yet been investigated, and their study will help illuminate the binding and coupling characteristics of this family. Although striking progress has been made in delineating the ligand binding sites of rhodopsin and the cationic amine GPRs and the structural motifs contributing to G-protein recognition, the molecular events by which ligand binding is transduced into G-protein activation remain to be elucidated. A more complete understanding of the three-dimensional structure, pharmacology, physiology, and anatomy of the GPRs will ultimately have a tremendous impact on our understanding of biology and on the development of pharmaceuticals.